Modern learning algorithms use gradient descent updates to train inferential models that best explain data. Scaling these approaches to massive data sizes requires proper distributed gradient descent schemes, in which worker nodes compute partial gradients based on their local portions of the data and send the results to a master node, where the computations are aggregated into a full gradient and the learning model is updated. However, a major performance bottleneck arises because some of the worker nodes may run slowly. These nodes, a.k.a. stragglers, can significantly slow down computation, as the slowest node may dictate the overall computation time. We propose a distributed computing scheme, called Batched Coupon's Collector (BCC), to alleviate the effect of stragglers in gradient methods. We prove that our BCC scheme is robust to a near-optimal number of random stragglers. We also empirically demonstrate that our proposed BCC scheme reduces the run time by up to 85.4% on Amazon EC2 clusters when compared with other straggler mitigation strategies. We further generalize the proposed BCC scheme to minimize the completion time when implementing gradient descent-based algorithms over heterogeneous worker nodes.
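The following is a minimal, self-contained Python sketch of the master–worker gradient aggregation setting described above, with a coupon-collector-style random assignment of data batches to workers so the master can proceed as soon as every batch is covered and ignore remaining (straggling) workers. The toy least-squares problem, the batch and worker counts, and helper names such as `partial_gradient` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Toy setting (assumed): least-squares loss, data split into `num_batches`
# batches. Each of `num_workers` workers independently picks a random batch;
# the master keeps the first gradient it receives for each batch and updates
# the model once all batches are covered, skipping slower duplicate workers.

rng = np.random.default_rng(0)
n, d = 1200, 10
X, y = rng.normal(size=(n, d)), rng.normal(size=n)
w = np.zeros(d)

num_batches, num_workers, lr = 4, 12, 0.1
batches = np.array_split(np.arange(n), num_batches)

def partial_gradient(w, idx):
    """Partial gradient of (1/(2n))*||Xw - y||^2 restricted to batch `idx`."""
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / n

for step in range(100):
    # Each worker draws a random batch; workers "arrive" in a random order,
    # emulating random straggling in a cluster.
    assignments = rng.integers(num_batches, size=num_workers)
    arrival_order = rng.permutation(num_workers)

    collected = {}
    for worker in arrival_order:
        b = assignments[worker]
        if b not in collected:
            collected[b] = partial_gradient(w, batches[b])
        if len(collected) == num_batches:  # all "coupons" collected
            break                          # remaining stragglers are ignored

    if len(collected) < num_batches:
        continue  # unlucky draw left a batch uncovered; skip this update

    w -= lr * sum(collected.values())      # full-gradient descent step
```

In this sketch the master never waits for the slowest workers: once the collected partial gradients cover every batch, their sum equals the full gradient and the update proceeds.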